Storage Fit Learning with Unlabeled Data
Abstract
By exploiting abundant unlabeled data, semi-supervised learning approaches have proved useful in various tasks. Existing approaches, however, neglect the fact that the storage available to the learning process differs from one situation to another, so learning approaches should be flexible with respect to the storage budget. In this paper, we focus on graph-based semi-supervised learning and propose two storage fit learning approaches that can adjust their behavior to different storage budgets. Specifically, we use low-rank matrix approximation techniques to find a low-rank approximator of the similarity matrix that meets the storage budget. The first approach is based on stochastic optimization; it is an iterative approach that converges globally to the optimal low-rank approximator. The second approach is based on the Nyström method, which can find a good low-rank approximator efficiently and is suitable for real-time applications. Experiments show that the proposed methods adapt to different storage budgets and achieve good performance in different scenarios.
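To make the Nyström idea concrete, here is a minimal sketch of a budget-constrained low-rank approximation of a similarity matrix. It is an illustration under stated assumptions, not the paper's algorithm: the RBF similarity, the uniform landmark sampling, the names `rbf_columns`, `rank_from_budget` and `nystrom`, and the rule that the budget counts stored floats (an n-by-m block plus an m-by-m block) are all hypothetical choices made for this example.

```python
import numpy as np

def rbf_columns(X, landmarks, gamma=1.0):
    """Similarity between every row of X and every landmark (assumed RBF kernel)."""
    d2 = (np.sum(X**2, axis=1)[:, None]
          + np.sum(landmarks**2, axis=1)[None, :]
          - 2.0 * X @ landmarks.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

def rank_from_budget(n, budget_floats):
    """Largest m with n*m + m*m <= budget_floats (hypothetical budget rule)."""
    m = int((-n + np.sqrt(n * n + 4 * budget_floats)) / 2)
    return max(1, min(m, n))

def nystrom(X, budget_floats, gamma=1.0, seed=0):
    """Return factors (C, W_pinv) such that K is approximated by C @ W_pinv @ C.T."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    m = rank_from_budget(n, budget_floats)
    idx = rng.choice(n, size=m, replace=False)  # uniform landmark sampling
    C = rbf_columns(X, X[idx], gamma)           # n x m block of the similarity matrix
    W_pinv = np.linalg.pinv(C[idx, :])          # pseudo-inverse of the m x m landmark block
    return C, W_pinv

# Usage: 200 points, budget of 4400 floats instead of the 40000 a full matrix needs.
X = np.random.default_rng(1).normal(size=(200, 5))
C, W_pinv = nystrom(X, budget_floats=4400)
print(C.shape, W_pinv.shape)
```

Only the two factors are kept in memory, so a smaller budget simply yields a smaller number of landmarks and a coarser approximation of the full similarity matrix.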
Similar resources
Estimate Unlabeled-Data-Distribution for Semi-supervised PU Learning
Traditional supervised classifiers use only labeled data (features/label pairs) as the training set, while the unlabeled data is used as the testing set. In practice, it is often the case that the labeled data is hard to obtain and the unlabeled data contains the instances that belong to the predefined class beyond the labeled data categories. This problem has been widely studied in recent year...
What causes category-shifting in human semi-supervised learning?
In a categorization task involving both labeled and unlabeled data, it has been shown that humans make use of the underlying distribution of the unlabeled examples. It has also been shown that humans are sensitive to shifts in this distribution, and will change predicted classifications based on these shifts. It is not immediately obvious what causes these shifts – what specific properties of t...
Multi-Label Classification with Unlabeled Data: An Inductive Approach
The problem of multi-label classification has attracted great interest in the last decade. Multi-label classification refers to problems where an example that is represented by a single instance can be assigned to more than one category. Until now, most research on multi-label classification has focused on supervised settings, whose assumption is that a large amount of labeled traini...
Efficient Computation and Model Selection in Semi-Supervised Learning
Traditional learning algorithms use only labeled data for training. However, labeled examples are often difficult or time-consuming to obtain, since they require substantial labeling effort from humans. On the other hand, unlabeled data are often relatively easy to collect. Semi-supervised learning addresses this problem by using large quantities of unlabeled data with the labeled data to build...
Data Dependant Learners Ensemble Pruning
Ensemble learning aims at combining several slightly different learners to construct a stronger learner. An ensemble of a well-selected subset of learners can outperform the ensemble of all of them. However, the well-studied accuracy/diversity ensemble pruning framework can lead to overfitting of the training data, which results in a target learner with relatively low generalization ability. We propose to ensemb...
Publication date: 2017